-
Notifications
You must be signed in to change notification settings - Fork 109
add support to store packages/archives locally #1685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
add support to store packages/archives locally #1685
Conversation
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
@VarshaUN Thanks for your work on this PR, it's a great start! That said, it currently lacks some important context. When submitting a PR, try to assume that the reviewer isn’t familiar with anything outside of what’s included here (like your proposal or earlier discussions). To improve this, please start by clearly documenting:
This will not only help with the current review process but will also serve as a strong foundation for future documentation improvements. It would also be helpful to include:
Keep up the good work—looking forward to seeing this evolve! |
Sure , I have mentioned it above , if needed I could share my Proposal for the context :) |
Signed-off-by: Varsha U N <varshaun58@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VarshaUN thanks for the initial PR!
Please see my comments for your consideration. Presently your tests are failing, and we can't configure locally your branch so I cannot do a detailed review on the functionality added by you yet. Please fix this so we can proceed towards that.
Could you also add some details of what you're trying to achieve in #1683 for better context as @tdruez has already requested?
Thanks, looks like a good start otherwise.
@@ -0,0 +1,28 @@ | |||
# Generated by Django 5.1.1 on 2025-05-26 09:19 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please merge your migrations into one file, since they are for the same fields?
|
||
from django.db import migrations, models | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please merge your migrations into one file, since they are for the same fields?
You can always generate multiple migration files as you go, but it's best to merge them and reorder based on merged branches before finally merging itself.
('download_date', models.DateTimeField(auto_now_add=True, help_text='Date when the package was downloaded or added.')), | ||
('scan_log', models.TextField(blank=True, help_text='Log output from scanning the package.')), | ||
('scan_date', models.DateTimeField(blank=True, help_text='Date when the package was scanned.', null=True)), | ||
('project', models.ForeignKey(editable=False, on_delete=django.db.models.deletion.CASCADE, related_name='downloadedpackages', to='scanpipe.project')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we want to put projects on_delete
like this here.
('download_date', models.DateTimeField(auto_now_add=True, help_text='Date when the package was downloaded or added.')), | ||
('scan_log', models.TextField(blank=True, help_text='Log output from scanning the package.')), | ||
('scan_date', models.DateTimeField(blank=True, help_text='Date when the package was scanned.', null=True)), | ||
('project', models.ForeignKey(editable=False, on_delete=django.db.models.deletion.CASCADE, related_name='downloadedpackages', to='scanpipe.project')), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to think how we are going to handle same package archive used in two different projects using different pipelines, or done with different SCIO versions.
Example, same package archive is scanned with inspect_packages
and scan_sing;e_package
Additionally, we need to look into having a help text show up with projects which were run on the same package.
Consider this when you build the models, but we can also update them later as this is preliminary anyway.
@@ -50,6 +55,7 @@ def steps(cls): | |||
cls.flag_empty_files, | |||
cls.flag_ignored_resources, | |||
cls.scan_for_application_packages, | |||
cls.store_package_archives, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I get what you are doing here.
We need to do this processing potentially before we even download the inputs so that we don't re-download these?
- check for the URL in the index, use that if present
- download and compute checksum, check in the checksum index, and use that if present
Additionally we also need to consider if we want to rescan again based on if we have a newer version of scancode or we are running different pipelines.
You presently are running this much after we download and scan things.
Signed-off-by: Varsha U N <varshaun58@gmail.com>
…-org#1686) * Upgrade packageurl-python to latest version aboutcode-org#1383 Signed-off-by: tdruez <tdruez@nexb.com> * Add make_mock_response to simplify setup in unit test aboutcode-org#1383 Signed-off-by: tdruez <tdruez@nexb.com> * Add support for fetching Package URLs (fetch_package_url) aboutcode-org#1383 Signed-off-by: tdruez <tdruez@nexb.com> * Add Package URL placeholder in InputsBaseForm aboutcode-org#1383 Signed-off-by: tdruez <tdruez@nexb.com> * Add CHANGELOG entry aboutcode-org#1383 Signed-off-by: tdruez <tdruez@nexb.com> --------- Signed-off-by: tdruez <tdruez@nexb.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: tdruez <tdruez@nexb.com>
…ent command (aboutcode-org#1690) Signed-off-by: tdruez <tdruez@nexb.com>
* Upgrade Ace library to latest version 1.42.0 Signed-off-by: tdruez <tdruez@nexb.com> * Upgrade Bulma CSS library to latest version 1.0.4 Signed-off-by: tdruez <tdruez@nexb.com> * Refine the CSS for the Resource viewer Signed-off-by: tdruez <tdruez@nexb.com> --------- Signed-off-by: tdruez <tdruez@nexb.com>
…1693) * Display matched snippets details in "Resource viewer" aboutcode-org#1688 Signed-off-by: tdruez <tdruez@nexb.com> * Remove print statements used for debugging aboutcode-org#1688 Signed-off-by: tdruez <tdruez@nexb.com> --------- Signed-off-by: tdruez <tdruez@nexb.com>
aboutcode-org#1694) In preparation of adding parent_path as a field aboutcode-org#1691 Signed-off-by: tdruez <tdruez@nexb.com>
Signed-off-by: tdruez <tdruez@nexb.com>
…rg#1697 (aboutcode-org#1698) Signed-off-by: tdruez <tdruez@nexb.com>
Signed-off-by: tdruez <tdruez@nexb.com>
…de-org#1699) Signed-off-by: tdruez <tdruez@nexb.com>
…1699 Signed-off-by: tdruez <tdruez@nexb.com>
* Add d2d symbols matching for winpe macho binaries Reference: aboutcode-org#1431 Reference: aboutcode-org#1432 Reference: aboutcode-org#1433 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com> * Use newly released source-inspector v0.6.0 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com> * Bump binary-inspector to v0.1.2 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com> * Add test as examples for macho/winpe symbol matching Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com> --------- Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: tdruez <tdruez@nexb.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Co-authored-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
Signed-off-by: Varsha U N <varshaun58@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Here are the results from the review.
|
||
# Package storage settings | ||
|
||
ENABLE_LOCAL_PACKAGE_STORAGE = env.bool("ENABLE_LOCAL_PACKAGE_STORAGE", default=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on our review session what about using this instead?
ENABLE_DOWNLOAD_ARCHIVING = env.bool("ENABLE_DOWNLOAD_ARCHIVING", default=False)
#localstorage, s3
DOWNLOAD_ARCHIVING_PROVIDER = env.str("DOWNLOAD_ARCHIVING_PROVIDER", default=None)
# for local storage we would store the root path in that setting
DOWNLOAD_ARCHIVING_PROVIDER_CONFIGURATION = env.dict("DOWNLOAD_ARCHIVING_PROVIDER_CONFIGURATION", default=None)
@@ -54,6 +55,8 @@ | |||
path("", RedirectView.as_view(url="project/")), | |||
] | |||
|
|||
urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use media for our storage, instead we are using our own thing.
@@ -178,6 +179,12 @@ def __init__(self, *args, **kwargs): | |||
pipeline_choices = scanpipe_app.get_pipeline_choices(include_addon=False) | |||
self.fields["pipeline"].choices = pipeline_choices | |||
|
|||
self.fields["use_local_storage"].label = "Store packages locally" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave forms for later
@@ -585,6 +585,7 @@ class Project(UUIDPKModel, ExtraDataFieldMixin, UpdateMixin, models.Model): | |||
) | |||
notes = models.TextField(blank=True) | |||
settings = models.JSONField(default=dict, blank=True) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the latest review, we do not need models yet.
@@ -20,9 +20,15 @@ | |||
# ScanCode.io is a free software code scanning tool from nexB Inc. and others. | |||
# Visit https://github.com/aboutcode-org/scancode.io for support and download. | |||
|
|||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use instead a modification of the super class, and not a modification to each of the pipelines.
@@ -20,30 +20,36 @@ | |||
# ScanCode.io is a free software code scanning tool from nexB Inc. and others. | |||
# Visit https://github.com/aboutcode-org/scancode.io for support and download. | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not change this for now. Instead create a new archiving.py
module where you can import selectively code from fetch.py
@@ -57,6 +57,11 @@ <h2 class="subtitle mb-0 mb-4"> | |||
<label class="label" for="{{ form.pipeline.id_for_label }}"> | |||
{{ form.pipeline.label }} | |||
</label> | |||
<label class="checkbox" for="{{ form.use_local_storage.id_for_label }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this. No forms just yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not commit a binary for your tests. A small one line text file will just work as well.
@@ -38,6 +40,7 @@ | |||
from django.conf import settings | |||
from django.contrib.auth import get_user_model | |||
from django.core.exceptions import ValidationError | |||
from django.core.files import File |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the tests to something that tests putting and getting the files instead in test_archiving.py
@@ -0,0 +1,93 @@ | |||
# Generated by Django 5.1.1 on 2025-07-09 | |||
|
|||
import django.db.models.deletion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No model change so no migrations.
Here are the notes from our review call: The overall design idea would be to have small JSON files stored side-by-side with the archived inputs. The content would be stored by hash like today, and we would have some extra metadata file, one for each origin (download URL, date and filename). We could have a simple base class to get/put files in the archive and a local file system implementation for now, enable with a global settings.
The files would be stored with a Major by checksum, and with immutable metadata. The local storage looks like this:
or an example:
Next step: where and when to store/archive downloads?
How to compute an origin file path?
|
Fixes #1683
Adds support for storing scanned packages/archives locally if checked .
Signed-off-by : Varsha U N varshaun58@gmail.com